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Abstract 

This article presents SAWdoubler, a package for counting the total number 
Zn of self-avoiding walks (SAWs) on a regular lattice by the length-doubling 
method, of which the basic concept has been published previously by us. 
We discuss an algorithm for the creation of all SAWs of length N, efficient 
storage of these SAWs in a tree data structure, and an algorithm for the 
computation of correction terms to the count Z2N for SAWs of double length, 
removing all combinations of two intersecting single-length SAWs. 

We present an efficient numbering of the lattice sites that enables ex- 
ploitation of symmetry and leads to a smaller tree data structure; this 
numbering is by increasing Euclidean distance from the origin of the lat- 
tice. Furthermore, we show how the computation can be parallelised by 
distributing the iterations of the main loop of the algorithm over the cores 
of a multicore architecture. Experimental results on the 3D cubic lattice 
demonstrate that Z28 can be computed on a dual-core PC in only 1 hour 
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and 40 minutes, with a speedup of 1.56 compared to the single-core com- 
putation and with a gain by using symmetry of a factor of 26. We present 
results for memory use and show how the computation is made to fit in 4 
Gbyte RAM. It is easy to extend the SAWdoubler software to other lattices; 
it is publicly available under the GNU LGPL license. 
Keywords: self-avoiding walk, enumeration, simple cubic lattice 

Program Summary 

Manuscript title: SAWdoubler: a program for counting self-avoiding walks 
Authors: Raoul D. Schram, Gerard T. Barkema, Rob H. Bisseling 
Program title: SAWdoubler 
Journal reference: 
Catalogue identifier: 

Program obtainable from: CPC Program Library, Queen's University, Belfast, 
N. Ireland; also from http://www.staff.science.uu.nl/~bissel01/SAW/ 

Number of lines of code of program: 1152 
Licensing provisions: GNU LGPL 
Distribution format: tar.gz 
Programming language: C 

Computer: Any computer with a UNIX-like operating system and a C compiler. 
For large problems, use is made of specific 128-bit integer arithmetic provided by 
the gcc compiler. 

Operating system: Any UNIX-like system; developed under Linux and Mac OS 10 
RAM: Problem dependent (2 Gbyte for counting SAWs of length 28 on the 3D 
cubic lattice) 

Number of processors used: 1. Parallel version available in directory Extras. 
Keywords: Self-avoiding walk, Enumeration, Simple cubic lattice. 
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Classification: 7. Condensed matter and surface science 

Nature of problem: Computing the number of self-avoiding walks of a given length 

on a given lattice 

Solution method: Length doubling 

Restrictions: The length of the walk must be even. Lattice is 3D simple cubic. 
Additional comments: The lattice can be replaced by other lattices, such as BCC, 
FCC, or a 2D square lattice 

Running time: Problem dependent (2.5 hours using one processor core for length 
28 on the 3D cubic lattice) 

1. Introduction 

Counting the number of self-avoiding walks on a regular lattice is a fun- 
damental problem in combinatorics and statistical physics. A self-avoiding 
walk (SAW) is a path in a lattice where each step goes from a lattice point 
to an adjacent point in the lattice, and where a previously visited point 
cannot be visited again. The SAW enumeration problem is of importance 
in physics because a SAW can be used to model the conformation of a poly- 
mer, where two monomers are forbidden to occupy the same location (the 
excluded- volume principle). Furthermore, this problem has been a challenge 
to mathematicians and physicists alike, because counting the exact number 
of SAWs is difficult. The number of SAWs of length N grows quickly 
with N, asymptotically as 

Z N » A n N N't' 1 . (1) 

Here, the factor jjl n dominates; it depends on the lattice, e.g. fi ~ 4.68404 
for the 3D cubic lattice. The factor iV 7-1 is a relatively small correction 
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to this, but knowledge of the exponent 7 is very useful since it is a lattice- 
independent (universal) exponent. A straightforward attack on the problem 
that generates all SAWs can only reach a limited length (currently about 
N = 24 for the 3D cubic lattice), because of the large number of SAWs. 
For most lattices, the value of the connective constant /j, is known only 
in approximation, with a few exceptions such as the honeycomb lattice in 
2D, with // = \Jl + y/2 pp. For all regular two-dimensional lattices, the 
exponent 7 is believed (but not proven) to be 72D = 43/32 [3J; its value in 
three dimensions is not known exactly and is estimated at 73^ ~ 1.157. 

The history of counting SAWs goes back at least six decades, to a paper 
by Orr [3] from 1947, who gave the counts Zn,N = 1,...,6, for the 3D 
cubic lattice. The number of steps in an enumeration for this lattice was 
successively increased by Fisher and Sykes [1] , Guttmann [51 16] , MacDonald 
et al. [3|8], and Clisby, Liang, and Slade [9], who reached ./V = 30. Recently, 
we further increased the number of steps to A = 36 by the length-doubling 
method [TO], see Section[TlJ giving Z 36 = 2, 941, 370, 856, 334, 701, 726, 560, 670. 
For the 2D square lattice, the current record is held by Jensen |llj . with 
Z 71 = 4,190,893,020,903,935,054,619,120,005,916. For more detail on 
many aspects of the SAW problem, see the monograph by Madras and 
Slade [TO]. 

The main goal of this article is to present an algorithm and its implemen- 
tation for counting SAWs on a regular lattice, which is based on the length- 
doubling method [TOJ we have published previously. Essentially, this method 
counts the number of SAWs of the double length 2N by taking statistics from 
the 2^ subsets of sites visited by each SAW of length A, thereby reducing the 
computational effort from O (^ 2N ) to O ((2//)^). We also discuss the use 
of symmetry to speed up the computation, and the use of parallelism. Our 

4 



presentation is accompanied by a computer program SAWdoubler, available 
from http : //www . staff . science .uu.nl/~bissel01/SAW71 under the GNU 
LGPL license. The program can in principle handle any regular lattice, and 
provides a sample implementation for the 3D cubic lattice. It is relatively 
straightforward to adapt the program to other lattices, by replacing the pro- 
gram file with functions specifying the lattice, while keeping the file with all 
the counting functions and data structures unchanged. For brevity and ease 
of illustration, we will often use examples from the 2D square lattice in this 
article. 

1.1. Length- doubling method 

A SAW of length A' on a regular lattice starting in the origin can be 
written as a sequence w = (ro, n, . . . , rjv) with ro = 0, meaning that we 
walk from the origin ro to lattice site ri, and so on, until we reach the end 
point r^v- Figure [T] illustrates a walk of length 10 on the square lattice in 



The length-doubling method is based on combining two walks of length 
iV into one walk of length 2N. Let w, w' be two SAWs. We can start a walk 
from the end point rjv of w in the reverse direction of w towards the origin 
and then continue to walk in the direction of the end point r' N of w' . This 
yields a walk of length 2N . If we translate the resulting walk by — r^r, we 
have a walk of length 2N starting in the origin. 

The result of combining two SAWs in this way may be self-avoiding or 
not, depending on the presence of an intersection point r. Let A r be the set 
of pairs of SAWs (w, w') that both pass through the lattice point r. Then 



2D. 




5 



O O © O O O O 



O O 0—6 © O O 



O O O O © © O 



O O O GH © () O 



O O O O 



o 



o o o o o o o 



o o o o o o o 



Figure 1: Self-avoiding walk of length N = 10 on the 2D square lattice. The walk starts 
in the origin (middle of the picture). 



because every pair (w,w') of SAWs of length ./V can be used to construct a 
SAW of length 2N, except if they both pass through a lattice point r. Ap- 
plying the inclusion-exclusion principle from combinatorics |13| to compute 
the number of elements of a union of sets from their intersections yields the 
length-doubling formula 

Z 2N = Zl + Y,(-^Z 2 N {S), (3) 

where S is a subset of the lattice points and Z^{S) the number of SAWs 
that pass through all elements of S. The numbers Zjy(S) can be obtained by 
creating all SAWs of length N (but not those of length 2N) and maintaining 
a bookkeeping of all the possible sets S encountered and their number of 
SAWs Z N (S). 

The implementation of the length-doubling method poses two main chal- 
lenges. First, all sets 5 have to be generated and, because of their large 
number, be stored efficiently or only part of the sets should be stored at the 



same time; the data structure used in our implementation is discussed in 
Sec. [2j Second, the summation over these sets as given in Eq. (|3j) has to be 
performed; this is discussed in Sec. [3j We then also pay attention to how 
symmetry properties of the SAWs can be exploited in Sec. |4j Our imple- 
mentation SAWdoubler is tested with respect to time and memory scaling 
in Sec. 03 We draw conclusions and discuss future extensions in Sec. [H 

2. Storing self-avoiding walks 

Since all SAWs start at the origin, we do not need to store the starting 
point. Furthermore, since the length-doubling method only cares about 
whether walks of length iV intersect, the order of the sites visited in a walk 
is irrelevant. A walk can therefore be written as a set 

W = {r 1 ,...,r N }. (4) 

Note that the same set of points W can result from several different SAWs. 

2.1. Numbering the lattice sites 

The number of lattice sites that can be reached by a SAW of length 
N is finite, and hence the sites can be numbered by a finite numbering <f>, 
irrespective of the dimensionality of the lattice. For the 2D square lattice, 
for instance, only the 2N 2 + 2N + 1 points r = (x, y) with < \x\ + \y\ < N 
can be reached. A suitable numbering could be 0, ... , 2N 2 + 2N. 

The canonical numbering C anon for the 2D square lattice is defined by 
site number s = C anon(^, y) = {x + N)L + y + A, where L = 2 A + 1 is 
the width of the smallest square lattice enclosing all reachable points. This 
leads to a numbering 0, . . . ,L 2 — 1, where not all sites are reachable. In 
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section |4j a different numbering will be presented which facilitates exploita- 
tion of symmetry. Using a numbering, a walk to be stored can be concisely 
represented by 

W = {w%, . . . , wn}, with wi < W2 < • • • < wn. (5) 

Note that the sites of W are now ordered by increasing site number, and 
not by the order in which the sites are visited. 

2.2. Tree data structure 

Our aim is to store all SAWs of length N in a data structure that requires 
as little memory as possible, but still enables operations such as finding all 
subsets S of a particular walk W. We could store all SAWs simply as lists 
of length N, but this would cause a lot of repetition, since SAWs are often 
similar to each other. 

We choose a tree as our data structure, with a special extra site as the 
root, with sites as nodes, and with parent-child relations defined by 

Wi = parent(wi + i), 1 < i < N, (6) 

for each walk W = {w±, . . . , wn}- The parent of w\ is the root. This tree 
data structure is illustrated by Fig. [2] Note that the same site number may 
occur several times in the tree. The tree is constructed by consecutively 
adding the SAWs to be stored, each time checking whether the lower num- 
bered part wi,...,Wi already exists in the tree when adding site u?j. If so, no 
new nodes need to be added for this part. Only when the new walk deviates 
from the tree, new nodes are introduced for the remainder Wi, . . . , wjy of the 
walk. 

At every node of the tree, the following information is stored: 



S 




Figure 2: Tree data structure as used by the SAWdoubler program for storing self-avoiding 
walks of length N = 4 on the 2D square lattice. Each tree node points to its parent. 
The root node is denoted by 0. The site numbering is the same as in Fig. [3] The 
walk w from (0,0) through (0,1), (—1,1), (—1,0), to (—2,0) corresponds to the walk 
w = (0, 8, 17, 13, 29) in this site numbering, and is stored as the set W = {8, 13, 17, 29} in 
the tree. This tree is used for the computation of Zn(S) for all sets of sites S that have 
29 as their highest site number. Only walks that contain 29 are stored, and only their 
sites s < 29. 
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• site, site number of the node; 

• count, number of SAWs with this node as highest site; 

• child, first child; 

• sibling, next sibling; 

• parent, parent; 

• stamp, a time stamp (not used while building the tree); 

• next, next node with the same site number (not used while building 
the tree). 

The variable site can be stored using a standard (32-bit) integer, as site 
numbers remain small, growing for instance as 0(N 2 ) for the 2D square 
lattice. The variable count initially (i.e., immediately after building the 
tree) contains the count of the number of SAWs with this node as highest 
site. If all walks have the same length N, the initial count is nonzero only at 
the leaves of the tree. The initial counts are modified during the computation 
by adding counts together so that the largest counts in the tree thus may 
become of order 0{Zn). Therefore, the variable count needs a 64-bit integer. 
Different walks visiting the same sites, but in a different order, will have the 
same set W , and hence the initial count can be larger than one. To enable 
storing walks of different lengths in the same tree, the variable count is also 
present in nonleaf nodes. In the length-doubling method, counts Zn(S) are 
squared, cf. Eqn. ([3]), and hence a few extra long (128-bit) integer variables 
such as Z2N must be used in order to match the size of the counts, but such 
variables are not needed in the nodes of the tree. 
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The variables child, sibling, and parent are needed for traversing the 
tree. They point to other nodes in the tree and are set to a dummy if no 
respective child, sibling, or parent exists. Finding the parent of a node is 
an immediate 0(1) operation. Finding all children requires finding the first 
child using child, and then following the linked list of siblings implemented 
by sibling. In our implementation, the siblings are ordered by increasing site 
number, which yields a rather modest savings in computation time when 
processing a new sibling. The savings are obtained in case the new sibling 
is already present in the sibling list; otherwise, the list has to be searched 
until the end. Ordering by increasing site number gives preference to lower- 
numbered sites, and these are closer to the origin and hence have more likely 
been encountered already. 

Two variables stamp and next are added to the node to facilitate opera- 
tions of the counting algorithm, Algorithm [2j see Section 3.2 The variable 



stamp represents a time stamp, which records when we pass a certain node 
while traversing the tree in the counting algorithm. This variable needs a 
64-bit integer for storage. Sometimes, we need to connect a set of nodes in 
the tree with the same site number s into a linked list. This list is imple- 
mented using the variable next. The total required storage per node is one 
32-bit integer and six 64-bit integers, which amounts to 52 bytes per tree 
node. 

The variables count, stamp, and next may change during the counting 
algorithm, but the tree structure as defined by site, child, sibling, parent 
remains the same after the tree has been built by the SAW-creating algo- 
rithm, Algorithm [TJ see Section 3.1 After the tree has been created, we 



will only use the variable parent, and the variables child and sibling are not 
used any more; in contrast, stamp and next are not used during creation of 
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the SAWs. Therefore, some space can be saved by storing child and stamp 
in one field, and the same for sibling and next. In our exposition, we will 
use the original field names, but in our program SAWdoubler, we save the 
memory of two 64-bit integers per node, reducing the required size for the 
tree to 36 bytes per node. 

The width and the depth of the tree are influenced by the numbering of 
the lattice sites. A careful numbering will limit the number of children of 
each node, especially near the root, and this will enhance the reuse of initial 
parts of walks in the tree. A suitable way to do this is to number the sites 
by increasing Euclidean distance from the origin. For the 2D square lattice, 
this limits the number of children of the root to four, whereas an arbitrary 
numbering could have a much larger number of children and hence would 
lead to little reuse. 

3. Algorithms 

3.1. Creating self-avoiding walks of length N 

Algorithm[T]gives the function Go, which creates all SAWs of length N by 
recursively exploring all unvisited adjacent lattice sites of the current site rj. 
When a SAW of length N has been created, it is converted to site numbers, 
sorted in increasing order, and inserted into the tree data structure. The 
walk (ro, ri, . . . , rj) is stored in the array R, with R[j] = rj for j = 0, . . . , i. 
The initial call of the function is Go(0, N, R, visited, Tree), where the whole 
array visited has been initialised to false, and the tree contains only the 
special root node. 
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Algorithm 1 Recursive algorithm for creating all Zn walks of length N 
1: function Go(i, N, R, visited, Tree) > Extend (ro, ri, . . . , rj) to length 

N 



2: visited(n) <- TRUE 

3: if i = N then 

4: for j ; = 1 to N do 

5: Wj <- 4>(rj) > apply numbering 

6: Sort(W, iV) > sort in increasing order 

7: Insert (W, Tree) 

8: else 

9: for all r G ylrfj( r «) do > visit all neighbours of 

10: if not visited(r) then 

11: r i+ i <- r 

12: Go(z + 1, N, R, visited, Tree) 

13: visited(ri) 4- FALSE 
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3.2. Calculating correction terms 

Algorithm [2] gives the function Correct, which calculates all correction 
terms (-1)\ S \Z N (S) 2 of SAWs of length N passing through a set S of lattices 
sites, by recursively expanding the set S to a superset S' . The initial call 
of the function is Correct( Tree, N, 0, 5ms, 0), where Tree has been filled by 
Algorithm [T] with all SAWs of length N. The algorithm works as follows. 

To expand the current set S, the algorithm first finds the maximum site 
number s max for the active tree nodes. A tree node is called active if its 
walk count contributes to the computation of the current Zn{S). To access 
all active nodes with the same site number, the algorithm uses a bin data 
structure. This structure stores the active nodes with site number s together 
in a bin Bins[s]; each bin is implemented as a linked list. At the start of 
the whole computation, all nodes with a nonzero count are active. Active 
nodes have the current time of the algorithm as a time stamp. Use of such 
a global clock makes it easy to render many nodes inactive by just updating 
the time variable. The variable time equals the number of different sets S 
created so far. 

As a first contribution, the set S, which does not contain s max , is ex- 
panded by smaller sites than s max . Let v be an active node with site number 
■Smax- If its parent pv is already active, the count of v must be added into 
that of pv, in order to give the total number of walks that pass through all 
sites of S and have the path from the root to pv as their lowest-numbered 
part. If the parent is not active, its count should simply be replaced by that 
of v and it will become active. After that, the function Correct is recursively 
called to handle all supersets S' D S with s max S' . The result is added to 
Z, with a positive sign since the size of S is unaltered, cf. the sign (— l)' 5 ' 
in Eqn. ([3]). Following the call, all node counts are restored to the situation 

14 



at the start of the function, using an undoing mechanism, details of which 
we omit for the sake of brevity. 

As a second contribution, the set S is expanded by smaller sites than 
s max and also s max itself is included. All walks that do not contain s max 
must now be discarded, which is done by incrementing the time, emptying 
the bins of active nodes, making the parents pv active, inserting them into 
bins, and stamping them with the new time. Also here, the function Correct 
is recursively called, but now the result is subtracted as the sign (— l)' 5 ' has 
changed, due to the expansion of S by one site. In our implementation, we 
also use a time stamping mechanism for the bins, making emptying all bins 
a cheap operation. 

Finally, we collect and sum the squares of the counts for the case where 
•Smax is the final site added to S, i.e., the site with minimum site number of 
S, and the set S is not expanded further. 

4. Exploiting symmetry 

For the 2D square lattice, the number of SAWs that end in a point 
(x, y) is the same as the number ending in (— x, y) because of symmetry, 
and similarly it is the same as the number for (x,—y), (—x,—y), (y,x), 
(—y,x), (y,—x), and (— y, — x). Thus, we have 8-fold symmetry which we 
should exploit for an efficient computation of Zjy. For the 3D cubic lattice, 
the potential gain is even larger, since we have 48-fold symmetry, obtained 
by composing the 8 reflections (=tx, ±y, ±z) with the 6 permutations of the 
variables x, y, z. 

The symmetry operations of a lattice form a group G, where every sym- 
metry operation Q € G has an inverse symmetry operation Q" 1 G G, and 
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Algorithm 2 Recursive algorithm for calculating the correction terms 



{-1)\ S \Z N {S) 2 for all sets S 



1: 


function Correct( Tree, N, S, Bins, time) 


> Correction for all S" D £ 


2: 


Z <- 




3: 


s max max{s : Bros[s] 7^ 0} 


> maximum site number 


4: 


if s max = root then 




5: 


return Z 




6: 
7: 


> Contribution for S" D S 1 with s max ^ 5' 




8: 


for all i> e Bms[s mal ] do 


> all nodes in bin 


9: 


= parent(v) 




10: 


if stamp(pv) = time then 


> parent is active 


11: 


count(pv) count(pv) + count(v) 




12: 


else 




13: 


count(pv) count(v) 




14: 


Insert (pv, Bins) 


> insert at header in bin 


15: 


stamp (pv) <— time 




16: 


Z ^ Z + Correct(Tree, N, S, Bins, time) 




17: 


Restore the counts 




18: 






19: 


> Contribution for S"DSU {s max } 




20: 


time <— time + 1 


> include s max in new S 


21: 


for s = to s max — 1 do 




22: 


5ms [s] <r- 


> empty the bins 


23: 


for all w e 5ms[s max ] do 




24: 


= parent(v) 




25: 


count(pv) <— count(v) 




26: 


Insert (jw, Bins) 




27: 


stamp(pv) time 




28: 


Z ^ Z — Correct^ Tree, N, S U {s max }, 5zns, time) 


29: 


Restore the counts 




30: 
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31: 


> Contribution for S" = 5 U {s max } 




32: 


for all v € 5ins[s max ] do 


> Smax is final site of S 


33: 


Z «- Z + count(v) 2 




34: 


return Z 





where there is an identity operation I £ G, and the operations are associa- 
tive. In general, the group need not be commutative. We denote the order, 
i.e. the number of elements, of group G by g = \G\. For the 2D square 
lattice, the group is isomorphic to the group of signed 2x2 permutation 
matrices, and its order is 8. 

For a given lattice point r, the symmetry operations that leave it invari- 
ant form a subgroup H r of G, defined by 

H r = {Q £ G : Qr = r}. (7) 

By Lagrange's theorem [TJ], the order h r of the subgroup divides the order 
g of G. Furthermore, the symmetry number of r, defined as 

Symm(r) = \{Qr : Q £ G}\, (8) 

satisfies 

Symm(r) ■ h r = g. (9) 

Thus, the symmetry number of a lattice point for the 2D square lattice 
is a divisor of g = 8. For the 3D cubic lattice, it is a divisor of 48; this 
means that up to 48 different lattice points can be obtained by symmetry 
operations executed on r. We call these points symmetrically equivalent or, 
for short, equivalent. Together, these points form an equivalence class 

[r] = {Qr : Q £ G}. (10) 

To exploit the symmetry, the numbering should make it easy to deter- 
mine whether two lattice points are equivalent. This can be achieved by 
numbering the points from the same equivalence class within a range of g 
numbers, from kg to (k + l)g — 1, for a certain k. There may be less than 
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g numbers from the range that are actually used. To check whether sites s 
and s' are equivalent, we just need to divide by g and round down: 



s ~ s 



(11) 



.9. 

Figure [3] shows a numbering that respects the symmetry for the 2D square 
lattice. 

Let Qs denote the site obtained from site s by applying symmetry op- 
eration Q, and QS the set of sites obtained from set S by applying Q to 
the sites of S. Note that \QS\ = \S\, because Q is a bijection. Similar to 
Eqn. ^ for a single lattice point, we can define the symmetry number of a 
set of sites S, 

Symm(S) = \{QS : Q € G}\. (12) 

We can order sets of the same size lexicographically, by comparing the 
highest site numbers first. For example, the set {2, 4, 7} is lexicograph- 
ically smaller than {3,5,7}, because we first compare the highest sites 
and find that 7 = 7, and then we find that 4 < 5. We denote this by 
{2,4,7} < lcx {3,5,7}. 

Our aim is to compute Zn(S) for every subset S of lattice sites that 
occurs in a walk of length N. Let S = {s±, . . . , s n } be such a subset, with 
s\ < S2 < ■ ■ ■ < s n and 1 < n < N. We call the highest site s n the terminal 
site of S. Note that this is not necessarily the end point of a walk through 
S. We can write 

S=(S\S)US, (13) 

where 

S = {s£ S : s ~ s n } (14) 
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Figure 3: Site numbering of the 2D square lattice for self- avoiding walks of length N = 4 
as produced by the SAWdoubler numbering function. Only reachable sites are numbered. 
Lattice points (a;, y) with < x < y (the grey area) are numbered first in their equivalence 
class, and their site numbers are a multiple of g = 8. Numbering of these points is by 
increasing the Euclidean distance from (0, 0). Lattice point (1, 2) has site number 32 and 
symmetry number Symm(32) = 8. Its equivalence class consists of the sites 32-39. Lattice 
point (0, 1) has site number 8 and symmetry number Symm(8) — 4. Its equivalence class 
consists of the sites 8, 10, 12, 13. There are no sites 9, 11, 14, 15 in this numbering. 
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is the terminal part of S, which is the set of sites equivalent to its terminal 
site. Checking whether s £ S belongs to S, reduces to checking whether 



s > 



Sri 
. 9 . 



(15) 



It is easy to prove that 

QS = Q(S\S)UQS, (16) 
as a disjoint union, and that 

QS = QS, (17) 

for all subsets 5 and all Q £ G. 

For a set of sites S, we can find an operation Qs £ G such that QsS is 
lexicographically the largest among the sets QS. The operation Qs is not 
unique, but the set QsS is. Since Z^{S) = Zjy(QsS), we need not compute 
Zn(S), but we can compute Z^{QsS) instead. This means that we only 
have to compute Zn(S) for sets S with S >i ex QS for all Q G G. 

For every lexicographically largest set S, there are Symm(S) symmetry 
operations Q £ G that lead to different sets QS. These operations also give 
different sets QS, because their terminal parts QS = QS are different. Note 
that for the same reason, we have 

Symm(S) < Symm(S). (18) 

We now just have to multiply Zn(S) by the symmetry number Symm(S) 
to account for all the omitted sets S; this symmetry number is most easily 
computed by using 

Symm(S) = — , (19) 
s 

similar to Eqn. ([£]). 
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This method fully exploits the available symmetry of S, and asymptoti- 
cally for large N this will approach the full symmetry of S, as in most cases 
the terminal part will only contain one site and it will have the maximum 
symmetry number, g. Furthermore, the overhead caused by computing the 
symmetry number is limited, as we only need to compute it for every S, and 
not for every S. When expanding S, i.e. when adding a new, smaller site 
■Smax) we compute the symmetry number if we leave the equivalence class 
of the terminal site (i.e., s max ^ s n ). From then onwards, S cannot change 
anymore, and we use its value for all S with the same terminal part S. It 
would also be possible to exploit the full symmetry of S instead of only that 
of S, but this would yield only limited gain and would cause some extra 
overhead. 

5. Experimental results 

In this section, we will test the performance of the SAWdoubler pro- 
gram both with respect to computation time and memory. In our previous 
work [10J , we used 200 processing cores of a supercomputer and spent about 
50,000 core hours for the computation of Z^q for the 3D cubic lattice. In 
the present work, we will focus instead on the performance on a PC with a 
limited amount of memory. Our test case is the same 3D cubic lattice. 

The test architecture we use is a dual-core Apple MacBook Pro with 
a 2.53 GHz Intel Core i5 dual-core processor and 4 GB RAM, a 256 KB 
L2-cache per core, a 3 MB L3-cache, and a 5400 rpm hard disk of size 500 
GB, running the MacOs 10.6.8 operating system. We use the gcc compiler, 
version 4.2, with flags -03 -Wall. 

SAWdoubler first creates SAWs of length N by Algorithm [T] and then 
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computes the correction terms (— 1)^Zn{S) by Algorithm^ The computa- 
tion of values Z^{S) for sets S has been organised such that all sets S with 
the same terminal site t are handled by a separate tree. A SAW w of length 
N is then only stored if it contains t, and only the sites s < t of the walk 
are stored in the tree; thus stored walks may have length less than N. The 
main program then loops over t up to the maximum reachable lattice site. 
This procedure saves much memory, and makes it possible to reach larger 
N. We call this the split-tree approach. If desired, the single site t could be 
replaced by a set of sites T, to reduce memory requirements further. 

Table [T] presents the computation time needed for calculating Z2N, for 
iV = 7, . . . , 14. The time given is the total elapsed time of a single run as 
measured by the Unix time utility. (For A < 6, the time needed is too 
short and our measurement becomes inaccurate; therefore we omit those 
results.) In almost all cases, the elapsed time is close to the used CPU time. 
Comparing columns in the table without and with symmetry shows that 
exploiting symmetry considerably accelerates the computation, by up to a 
factor of 26.2 for N = 14. 

We use two different numberings in our experiments for Table [TJ Chang- 
ing the numbering from ordering by the Euclidean norm to ordering by the 
Manhattan norm (||x||i = ^ saves up to a modest 5 per cent in time 
for A < 13 but it takes about 10 per cent more memory. This becomes a 
disadvantage for A = 14, where the amount of memory required is close to 
the total amount available. Both numberings order the lattice points by an 
increasing distance from the origin, given by the respective norm, and thus 
perform much better than other numberings (that we used in our initial 
implementations of SAWdoubler.) 

The last column of Table [T] represents an attempt to use the full com- 
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puting capability of the dual-core architecture by employing both cores in 
parallel. This is done by running two instances of the program simultane- 
ously, one executing the odd iterations of the main loop, and the other the 
even ones. This already gives a speedup of around 1.7 for N = 9-13 and the 
L2 norm. The load imbalance here is reasonable, with the largest imbalance 
(3.5 per cent above the average time) observed for N = 13, one core running 
687 s and the other 736 s. Both cores use the same shared memory, so they 
may hinder each other and both must store a complete tree in memory. For 
N = 14, the trees become very large, and together they fill up about two 
thirds of the available RAM memory. Here, the CPU time was about 10 
per cent less than the elapsed time, perhaps caused by cores interfering with 
each other when making use of shared resources such as the RAM and the 
L3 cache. This difference between CPU time and elapsed time only occurred 
for the largest problem instance N = 14. The resulting speedup for N = 14 
is 1.56 out of 2. 

The dual-core approach can be generalised to more cores by cyclic as- 
signment: processor core c = 0, . . . ,p— 1 from a set of p cores will carry out 
iterations c, c + p,c + 2p, ... of the main loop. For larger N and a larger 
number of cores p, this static distribution of work by cyclic assignment may 
lead to larger imbalance than that observed for two cores, in particular since 
the amount of work may then vary considerably between loop iterations. In 
that dynamic distribution of work based on a job queue would lead 

to better balance. 

Considering the growth of the computation time with increasing N, we 
note that moving from N = 12 to N = 13 increases the time by a factor 
of 7.1 (using symmetry and the L2 norm), and moving from N = 13 to 
N = 14 by a factor of 7.4. Asymptotically, the length-doubling method 
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grows as C((2/x) ) ~ 0(9.368 ), since every one of the = 0(fi ) walks 
of length N has 2^ different subsets S and incrementing a counter for each 
of these costs 0((2[i) N ) operations. The memory savings of the tree by 
eliminating repetition also pays off in computation time, as counters may 
now be incremented by larger numbers than one. This results in slower 
initial growth than the factor of 9.368 theoretically predicted. 

For comparison, we also used our program to compute in a straigh- 
forward way, without length-doubling and without using symmetry, just by 
creating and counting all Zn walks. The computation of Z17 in this manner 
already took 9258 s, about the same time as the computation of Z28 with 
length-doubling and symmetry. 

Examining the breakdown of the computation time, we observed that 
for large N by far most of the time is spent in computing the correction 
terms by Algorithm [2j A notable 25 per cent of that time is spent in finding 
the largest remaining site s max using the bin structure, and the remainder 
in traversing the tree. Finding s max can possibly be optimised in the future, 
perhaps by using some form of hashing, as many bins will be empty. 

Table [2] presents the memory requirements of the SAWdoubler program 
for the split-tree approach, and for comparison also for the approach where 
the tree is not split, i.e. the single-tree approach. These requirements should 
be compared with Zjv, the number of SAWs w of length N, and also with the 
related number Z' N of sets of sites W obtained from the SAWs, i.e. ignoring 
the walk order; the value of Z' N does not depend on the chosen numbering. 
It holds trivially that Z' N < Zfq. Comparing the single-tree storage with its 
lower bound Z' N and its upper bound NZjy, we observe that the storage is 
within a range of 1.82-2.09 times the lower bound, and that it is far from 
the upper bound. Using a tree thus saves a lot of memory. The number Z' N 
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N No symmetry Symmetry Symmetry Symmetry 
L2 norm L2 norm L\ norm L2 norm 
1 core 1 core 1 core 2 cores 



7 


0.210 


0.016 


0.020 


0.017 


8 


1.42 


0.091 


0.089 


0.062 


9 


9.70 


0.534 


0.518 


0.316 


10 


69.8 


3.49 


3.35 


1.96 


11 


530. 


24.5 


23.3 


14.1 


12 


4110. 


177. 


169. 


102. 


13 


30990. 


1259. 


1213. 


736. 


14 


244235. 


9331. 


9417. 


5976. 



Table 1: Time (in s) of the computation of the number of SAWs of length 2N for the 
3D cubic lattice. Exploitation of 48-fold symmetry is either disabled or enabled; the site 
numbering is based on either the L2 (Euclidean) norm or the L\ (Manhattan) norm; and 
either one or two cores of the dual-core processor are used. 
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is easily obtained by counting the leaves of the single tree, as each set W 
must have its own leaf in the tree data structure. 

Splitting the tree, by only storing walks with a particular terminal site 
t, further reduces memory, by up to a factor 109 for N = 11, and makes 
computations possible for N = 12, 13, 14 that are otherwise infeasible. Mem- 
ory usage can be reduced by another factor of at least 1.4 by deleting the 
terminal site itself from the tree; this means deleting one layer of leaves, 
e.g. deleting node 29 everywhere in Fig. [2] Since we use 36 bytes per node 
for storing the tree itself, and another 16 bytes per node for the undoing 
mechanism, we need a total of 2.0 GB storage for N = 14. 

6. Conclusion and future work 

In this article, we have presented an algorithm for counting the num- 
ber of self-avoiding walks of length 2N by creating self-avoiding walks of 
length N, based on the length-doubling method [10]. We have made avail- 
able a program SAWdoubler in C, which implements the method, exploits 
symmetry, and uses an efficient data structure. 

We have computed Z 28 = 12, 198, 184, 788, 179, 866, 902 for the 3D cubic 
lattice on a dual-core laptop computer with 4 GB main memory in 1 hour 
and 40 minutes, and thereby demonstrated the efficiency of our program. 
We have verified the counting results up to N = 28 of our previous work |10j , 
which was done by a completely different implementation. Furthermore, we 
have shown that two processor cores of a dual-core processor can be used 
with a speedup of 1.7, provided two copies of the problem tree fit into the 
shared memory. 

The design of the SAWdoubler program makes it easy to extend the com- 
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N 


Zn 


N- Z N 


Nodes 


Leaves 


Max nodes 


Max nodes 








single tree 


single tree 


split tree 


split tree 










— 7' 


with term. 


w/o term. 


7 


81390 


569 730 


129 846 


71498 


2 672 


1692 


8 


387966 


3103 728 


643 824 


341421 


11927 


7886 


9 


1853 886 


16684974 


3150 431 


1601052 


38 205 


25 981 


10 


8 809 878 


88 098 780 


15 367 644 


7 596 096 


183 532 


122 983 


11 


41934150 


461275 650 


74 587 922 


35 616 048 


682 590 


465 637 


12 


198 842 742 


2 386112 904 






2 854104 


1969 834 


13 


943 974 510 


12 271668 630 






11961303 


8 234139 


14 


4468 911678 


62 564 763 492 






54177636 


37849 701 



Table 2: Memory usage (in number of nodes) during the computation of the count Z2N 
of SAWs of length 2N for the 3D cubic lattice. The value N ■ Zn is the storage needed if 
every SAW of length N would be stored separately in an array of length N. The single-tree 
columns give the storage (number of nodes) for one tree storing all the SAWs of length 
N and also the number of leaves. The split-tree columns give the maximum number of 
nodes of a tree in case a separate tree is used for all sets S with the same terminal site, 
with or without the terminal site stored. Site numbering is by the Euclidean norm. 
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putation to other lattices. Anyone can replace the file lattice, c (aimed at 
the 3D cubic lattice) by another file such as for the 2D square or honeycomb 
lattice, the 3D BCC or FCC lattice, or the 4D hypercubic lattice, which is 
straightforward to do, and no change in the tree structure functions of the 
file sawdoubler.c is needed, nor changes in the main file saw.c. 

For future work, the software can be extended to compute Z2N+X as 
well as Z2N , and to compute squared end-to-end distances 1 1 rjv — ro 1 1 2 , as 
has been done in |10| . A limitation of the present software is the size of 
the tree for one terminal site t. Generalising to a terminal set T to keep 
the tree within any amount of available memory would be the next step. 
Future research could investigate variants of the present problem, such as 
self-avoiding polygons and lattices with forbidden regions. The present work 
should provide an efficient and extendible basis for such investigations. 
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